Abstract

A universalization of a parameterized investment strategy is an online algorithm whose 
average daily performance approaches that of the strategy operating with the optimal parameters 
determined offline in hindsight. We present a general framework for universalizing investment 
strategies and discuss conditions under which investment strategies are universalizable. We 
present examples of common investment strategies that fit into our framework. The examples 
include both trading strategies that decide positions in individual stocks, and portfolio strategies 
that allocate wealth among multiple stocks. This work extends Cover's universal portfolio work. 
We also discuss the runtime efficiency of universalization algorithms. While a straightforward 
implementation of our algorithms runs in time exponential in the number of parameters, we show 
that the efficient universal portfolio computation technique of Kalai and Vempala involving the 
sampling of log-concave functions can be generalized to other classes of investment strategies. 



1 Introduction 

An age-old question in finance deals with how to manage money on the stock market to obtain an 
"acceptable" return on investment. An investment strategy is an online algorithm that attempts to 
address this question by applying a given set of rules to determine how to invest capital. Typically, 
an investment strategy is parameterized by a real-valued vector $w$ that dictates how the
strategy operates. The optimal parameters that maximize the strategy's return are unknown when 
the algorithm is run and the parameters are usually chosen quite arbitrarily. A universalization of 
an investment strategy is an online algorithm based on the strategy whose average daily perfor- 
mance approaches that of the strategy operating with the optimal parameters determined offline 
in hindsight. 

Consider the constantly rebalanced portfolio (CRP) investment strategy, universalized by Cover
and the subject of several extensions and generalizations [6, 13, 15]. The CRP strategy
maintains a constant proportion of total wealth in each stock, where the proportions are dictated
by the parameters given to the strategy. In a stock market with $m$ stocks, the parameter space for
the CRP strategy is

\[ \mathcal{W}_m = \Big\{ w \in [0, 1]^m \;\Big|\; \sum_{i=1}^m w_i = 1 \Big\}, \]

the set of vectors in $\mathbb{R}^m$ whose components are between 0 and 1 and add up to 1. Given a portfolio
vector $w = (w_1, \ldots, w_m) \in \mathcal{W}_m$, $w_i$ tells us the proportion of wealth to invest in stock $i$, for
$1 \le i \le m$. At the beginning of each day, the holdings are rebalanced, i.e., money is taken out of
some stocks and put into others, so that the desired proportions are maintained in each stock. As
an example of the robustness of the CRP strategy, consider the following market with two stocks
[15]. The price of one stock remains constant, while the other stock doubles and halves in price
on alternate days. Investing in a single stock will at most double our money. With a CRP$(\frac12, \frac12)$
strategy, our wealth will increase exponentially, by a factor of $(\frac12 \cdot 1 + \frac12 \cdot 2) \times (\frac12 \cdot 1 + \frac12 \cdot \frac12) = \frac32 \times \frac34 = \frac98$
every two days.
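The two-stock example can be replayed numerically. The following is a minimal sketch, not part of the original presentation; the function name `crp_wealth` and the four-day market are assumptions chosen for illustration.

```python
from fractions import Fraction

def crp_wealth(weights, returns):
    """Wealth multiple of a constantly rebalanced portfolio.

    weights: per-stock proportions summing to 1 (hypothetical input format).
    returns: list of daily return vectors (price-change factors per stock).
    """
    wealth = Fraction(1)
    for x in returns:
        # Rebalancing every morning makes the daily factor the weighted
        # average of the individual stock factors.
        wealth *= sum(w * xi for w, xi in zip(weights, x))
    return wealth

half = Fraction(1, 2)
# Stock 1 is flat; stock 2 doubles and halves on alternate days (4 days).
market = [(Fraction(1), Fraction(2)), (Fraction(1), half)] * 2
crp = crp_wealth((half, half), market)
single_stock = crp_wealth((Fraction(0), Fraction(1)), market)
```

Under these assumptions `crp` grows by 9/8 per two-day cycle, while holding only the second stock ends each cycle where it started.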

Cover developed an investment strategy that effectively distributes wealth uniformly over all
portfolio vectors $w \in \mathcal{W}_m$ on the first day and executes the CRP strategy with daily rebalancing
according to each $w$ on the (infinitesimally small) proportion of wealth initially allocated to each
$w$. Cover showed that the average daily log-performance¹ of such a strategy approaches that of the
CRP strategy operating with the optimal, return-maximizing parameters chosen with hindsight.

This paper generalizes previous results and introduces a framework that allows universalizations
of other parameterized investment strategies. As we see in Section 2, investment strategies fall under
two categories: trading strategies operate on a single stock and dictate when to buy and short² the
stock; portfolio strategies, such as CRP, operate on the stock market as a whole and dictate how
to allocate wealth among multiple stocks. We present several examples of common trading and
portfolio strategies that can be universalized in our framework. We discuss our universalization
framework in Section 3. The proofs of our results are very general and, as with previous universal
portfolio results, we make no assumptions on the underlying distribution of the stock prices; our
results are applicable for all sequences of stock returns and market conditions. The running times
of universalization algorithms are, in general, exponential in the number of parameters used by the
underlying investment strategy. Kalai and Vempala [13] presented an efficient implementation of
the CRP algorithm that runs in time polynomial in the number of parameters. In Section 4, we
present general conditions on investment strategies under which the universalization algorithm can
be efficiently implemented. We also give some investment strategies that satisfy these conditions.
Section 5 concludes with directions for further research.



2 Types of Investment Strategies 

Suppose we would like to distribute our wealth among $m$ stocks³. Investment strategies are general
classes of rules that dictate how to invest capital. At time $t \ge 0$, a strategy $S$ takes as input an
environment vector $\mathcal{E}_t$ and a parameter vector $w$, and returns an investment description $S_t(w)$
specifying how to allocate our capital at time $t$. The environment vector $\mathcal{E}_t$ contains historic
market information, including stock price history, trading volumes, etc.; the parameter vector
$w$ is independent of $\mathcal{E}_t$ and specifies exactly how the strategy $S$ should operate; the investment
description $S_t(w) = (S_{t1}(w), \ldots, S_{tm}(w))$ is a vector specifying the proportion of wealth to put
in each stock, where we put a fraction $S_{ti}(w)$ of our holdings in stock $i$, for $1 \le i \le m$. For
example, CRP is an investment strategy; coupled with a portfolio vector $w$ it tells us to "rebalance
our portfolio on a daily basis according to $w$"; its investment description, $\text{CRP}_t(w) = w$, is
independent of the market environment $\mathcal{E}_t$.

1 The average daily log-performance is the average of the logarithms of the factors by which our wealth changes
on a daily basis. This notion is discussed further in Section 5.1.
2 A short position in a stock, discussed in Section 2.1, allows us to earn a profit when the stock declines in value.
3 We use the term "stocks" in order to keep our terminology consistent with previous work, but we actually mean
a broader range of investment instruments, including both long and short positions in stocks.

There are two types of investment strategies. Trading strategies tell us whether we should take 
a long (bet that the stock price will rise) or a short (bet that the stock price will fall) position on a 
given stock. Portfolio strategies tell us how to distribute our wealth among various stocks. Trading 
strategies are denoted by T, and portfolio strategies are denoted by P. We use S to denote either 
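kind of strategy. For $k \ge 2$, let

\[ \mathcal{W}_k = \Big\{ w = (w_1, \ldots, w_k) \in [0, 1]^k \;\Big|\; \sum_{i=1}^k w_i = 1 \Big\}. \tag{1} \]

Remark 1 $\mathcal{W}_k$ is a $(k - 1)$-dimensional simplex in $\mathbb{R}^k$. The investment strategies that we describe
below are parameterized by vectors in $\mathcal{W}_k^\ell = \mathcal{W}_k \times \cdots \times \mathcal{W}_k$ ($\ell$ times) for some $k \ge 2$ and $\ell \ge 1$.
We may write $w \in \mathcal{W}_k^\ell$ in the form $w = (w_1, \ldots, w_\ell)$, where $w_\iota = (w_{\iota 1}, \ldots, w_{\iota k})$ for $1 \le \iota \le \ell$.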

2.1 Trading Strategies 

Suppose that our market contains a single stock. We have m = 2 potential investments: either a 
long position or short position in the stock. To take a long position, we buy shares in hopes that 
the share price will rise. We close a long position by selling the shares. The money we use to buy 
the shares is our investment in the long position; the value of the investment is the money we get 
when we close the position. If we let $p_t$ denote the stock price at the beginning of day $t$, the value
of our investment will change by a factor of $x_t = p_{t+1}/p_t$ from day $t$ to $t + 1$.

To take a short position, we borrow shares from our broker and sell them on the market in hopes 
that the share price will fall. We close a short position by buying the shares back and returning 
them to our broker. As collateral for the borrowed shares, our broker has a margin requirement: a
fraction $\alpha$ of the value of the borrowed shares must be deposited in a margin account. Should the
price of the security rise sufficiently, the collateral in our margin account will not be enough, and 
the broker will issue a margin call, requiring us to deposit more collateral. The margin requirement 
is our investment in the short position; the value of the investment is the money we get when we 
close the position. 

Lemma 1 Let the margin requirement for a short position be $\alpha \in (0, 1]$. Suppose that a short
position is opened on day $t$ and that the price of the underlying stock changes by a factor of
$x_t = p_{t+1}/p_t < 1 + \alpha$ during the day. Then the value of our investment in the short position changes
by a factor of $x'_t = 1 + \frac{1 - x_t}{\alpha}$ during the day.

Proof: Suppose that we have $v$ dollars to deposit in the margin account. Using this as our investment
in the short position, we can sell $v/\alpha$ dollars' worth of shares. Combining the proceeds of the stock sale
with our margin account balance, we will have a total of $v + v/\alpha$ dollars. At the end of the day, it
will cost $x_t v/\alpha$ dollars to buy the shares back, and we will be left with $v + \frac{v}{\alpha} - x_t \frac{v}{\alpha}$ dollars, which
is positive since $x_t < 1 + \alpha$. Thus, our investment of $v$ dollars in the short position has changed by a
factor of $1 + \frac{1 - x_t}{\alpha}$, as claimed. ■
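The cash flows in the proof of Lemma 1 can be checked directly. This is an illustrative sketch only; the function names and the 100-dollar deposit are assumptions, not part of the original text.

```python
def short_return_factor(x_t, alpha):
    """Closed form from Lemma 1: factor by which a short investment changes."""
    if not 0 < alpha <= 1:
        raise ValueError("margin requirement alpha must lie in (0, 1]")
    if x_t >= 1 + alpha:
        raise ValueError("Lemma 1 assumes x_t < 1 + alpha")
    return 1 + (1 - x_t) / alpha

def short_return_by_accounting(x_t, alpha, v=100.0):
    """Replay the proof's cash flows for a margin deposit of v dollars."""
    proceeds = v / alpha       # value of the shares sold short
    cash = v + proceeds        # margin deposit plus sale proceeds
    buy_back = x_t * proceeds  # cost of buying the shares back
    return (cash - buy_back) / v
```

For example, with a 50% margin requirement and a 10% price drop, both routes give a factor of 1.2.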

Should the price of the underlying stock change by a factor greater than $1 + \alpha$, we will lose
more money than we initially put in. We will assume that the margin requirement $\alpha$ is sufficiently
large that the daily price change of the stock is always less than $1 + \alpha$.

Remark 2 This assumption can be eliminated by purchasing a call option on the stock with some
strike price $p < (1 + \alpha)p_t$. Should the stock price get too high, the call allows us to purchase the
stock back for $p$ dollars. Though its price detracts from the performance of our short trading strategy,
the call protects us from potentially unlimited losses due to rising stock price.

If a short position is held for several days, assume that it is rebalanced at the beginning of each
day: either part of the short is closed (if $x_t > 1$) or additional shares are shorted (if $x_t < 1$) so that
the collateral in the margin account is exactly an $\alpha$ fraction of the value of the shorted shares. This
ensures that the value of a short position changes by a factor $x'_t = 1 + \frac{1 - x_t}{\alpha}$ each day. Treated
in this way, short positions can be viewed as any other stock, so trading strategies are
effectively investment strategies that decide between two potential investments: a long or a short
position in a given stock. The investment description of a trading strategy $T$ is $T_t = (T_{t1}, T_{t2})$,
where $T_{t1}$ and $T_{t2}$ are the fractions of wealth to put in a long and a short position respectively.

Remark 3 Let $D = T_{t1} - T_{t2}/\alpha$ be the net long position of the investment description. In practice,
if $D > 0$, investors should put a $D$ fraction of their money in the long position and a $1 - D$ fraction
in cash; if $D < 0$, investors should invest $|D|$ in the short position and $1 - |D|$ in cash; if $D = 0$,
investors should avoid the stock completely and keep all their money in cash. From a practical
standpoint, it is desirable for the trading strategy to be decisive, i.e. $|D| = 1$, so that our allocation
of money to the stock is always fully invested in the stock (either as a long or a short position).
We show in Section 3 that investment strategies that are continuous in their parameter spaces are
universalizable. Though decisive trading strategies $T$ are discontinuous, they can be approximated
by continuous strategies whose investment descriptions converge almost everywhere to $T_t$ as $t \to \infty$
(see, for example, (3) below).



We now describe some commonly used and researched trading strategies [17, 18] and show
how they can be parameterized.



MA[k]: Moving Average Cross-over with k-day Memory. In traditional applications
of this rule, we compare the current stock price with the moving average over, say, the previous
200 days: if the price is above the moving average, we take a long position, otherwise we take a
short position. Some generalizations of this rule have been made, where we compare a fast moving
average (over, for example, the past five to 20 days) with a slow moving average (over the past
50 to 200 days). We generalize this rule further. Given day $t \ge 0$, let $v_t = (v_{t1}, \ldots, v_{tk})$ be the
price-history vector over the previous $k$ days, where $v_{tj}$ is the stock price on day $t - j$. Assume
that the stock prices have been normalized such that $0 < v_{tj} \le 1$. Let $(w_F, w_S) \in \mathcal{W}_k^2$ (where
$\mathcal{W}_k$ is defined in (1)) be the weights to compute the fast moving and slow moving averages, so
these averages on day $t$ are given by $w_F \cdot v_t$ and $w_S \cdot v_t$ respectively. Since the prices have been
normalized to the interval $(0, 1]$, $-1 \le (w_F - w_S) \cdot v_t \le 1$. Let $g : [-1, 1] \to [0, 1]$ be the long/short
allocation function. The idea is that $g((w_F - w_S) \cdot v_t)$ represents the proportion of wealth that we
invest in a long position. The full investment description for the MA = MA[k] trading strategy is

\[ \text{MA}_t(w_F, w_S) = \big( g((w_F - w_S) \cdot v_t),\; 1 - g((w_F - w_S) \cdot v_t) \big). \]

Note that the dimension of the parameter space for MA[k] is $2(k - 1)$ since each of $w_F$ and $w_S$ is
taken from a $(k - 1)$-dimensional space. Possible functions for $g$ include

\[ g_s(x) = \ldots \quad \text{(step function);} \tag{2} \]

\[ g_{(t)}(x) = \ldots \quad \text{(linear step approximation);} \tag{3} \]

and the line

\[ g_\ell(x) = \frac{x + 1}{2} \tag{4} \]

that intersects $g_s(x)$ at the extreme points $x = \pm 1$ of its domain. Note that $g_{(t)}(x)$ is parameterized
by the day $t$ during which it is called and that it converges to $g_s(x)$ on $[-1, 1] \setminus \{0\}$ as $t$ increases.

Remark 4 The long/short allocation function used in traditional applications of this rule is the
step function $g_s(\cdot)$. As we see in Section 3, in order for an investment strategy to be universalizable,
its allocation function must be continuous, necessitating the continuous approximation $g_{(t)}(\cdot)$. The
linear approximation $g_\ell(\cdot)$ can be used with the results of Section 4 to allow for efficient computation
of the universalization algorithm.
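To make the MA[k] investment description concrete, the sketch below evaluates it for one day using the line $g_\ell$ from (4). The function names, the four-day price history and the two weight vectors are illustrative assumptions.

```python
def g_line(x):
    """The line g_l(x) = (x + 1) / 2, mapping [-1, 1] onto [0, 1]."""
    return (x + 1) / 2

def ma_description(w_fast, w_slow, v_t, g=g_line):
    """(long fraction, short fraction) of MA[k] for one day.

    w_fast, w_slow: weight vectors in the simplex W_k.
    v_t: price history normalized to (0, 1], most recent day first.
    """
    signal = sum((wf - ws) * v for wf, ws, v in zip(w_fast, w_slow, v_t))
    long_fraction = g(signal)
    return long_fraction, 1 - long_fraction

# Hypothetical rising market: v_t[j-1] is the price j days ago.
v_t = [1.0, 0.9, 0.8, 0.7]
w_fast = [0.5, 0.5, 0.0, 0.0]      # average of the last two days
w_slow = [0.25, 0.25, 0.25, 0.25]  # average of all four days
long_frac, short_frac = ma_description(w_fast, w_slow, v_t)
```

Here the fast average exceeds the slow average by 0.1, so slightly more than half of the wealth goes to the long position.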



SR[k]: Support and Resistance Breakout with k-day Memory. Discussed as early as
Wyckoff [18] in 1910, this strategy uses the idea that the stock price trades in a range bounded
by support and resistance levels. Should the price fall below the support level, the idea is that
it will continue to fall and a short position should be taken in the stock. Similarly, should the
price rise above the resistance level, the idea is that it will continue to rise and a long position
should be taken in the stock. If the stock price remains between the support and resistance levels,
the idea is that it will continue to trade in this range in an unpredictable pattern and the stock
should be avoided. Support and resistance levels are defined quite arbitrarily in practice, usually
the minimum and maximum prices over the past $k$ days, where $k$ is usually taken to be 50, 150,
or 200. To generalize this rule, given day $t \ge 0$, let $\underline{v}_t = (\underline{v}_{t1}, \ldots, \underline{v}_{tk})$ and $\overline{v}_t = (\overline{v}_{t1}, \ldots, \overline{v}_{tk})$
be the minimum and maximum price histories, where $\underline{v}_{tj}$ and $\overline{v}_{tj}$ are the minimum and maximum
prices over the previous $j$ days, normalized so that they are in the range $(0, 1]$. Let $w \in \mathcal{W}_k$ be
the weights to compute the support and resistance levels, so these levels on day $t$ are given by
$s_t = w \cdot \underline{v}_t$ and $r_t = w \cdot \overline{v}_t$ respectively.

Lemma 2 The support level is bounded above by the resistance level: $s_t \le r_t$.

Proof: This follows from the fact that for all $1 \le j \le k$, $\underline{v}_{tj} \le \overline{v}_{tj}$. ■

The long/short allocation function will be denoted by $h : \{(x, y) \in [-1, 1]^2 \mid x \le y\} \to [0, 1]$.
Let $p_t$ be the current stock price (normalized to $(0, 1]$ along with $\underline{v}_t$ and $\overline{v}_t$). The idea is that
$h(p_t - r_t, p_t - s_t)$ tells us the proportion of wealth that we invest in a long position. The full
investment description for the SR = SR[k] trading strategy is

\[ \text{SR}_t(w) = \big( h(p_t - r_t, p_t - s_t),\; 1 - h(p_t - r_t, p_t - s_t) \big). \]

The value of $h$ need only be defined on $\{(x, y) \in [-1, 1]^2 \mid x \le y\}$ since, by Lemma 2, $s_t \le r_t$. A
possible function for $h$ is

\[ h_s(x, y) = \begin{cases} 0 & \text{if } x \le y < 0 \\ \frac{1}{1 + \alpha} & \text{if } x \le 0 \le y \\ 1 & \text{if } y \ge x > 0 \end{cases} \quad \text{(step function),} \tag{5} \]

where the investment allocation $\frac{1}{1 + \alpha}$ long, $1 - \frac{1}{1 + \alpha} = \frac{\alpha}{1 + \alpha}$ short is equivalent to having no position
in the stock, since the return from such an allocation is $\frac{1}{1 + \alpha} x_t + \frac{\alpha}{1 + \alpha}\big(1 + \frac{1 - x_t}{\alpha}\big) = 1$. Other possibilities
include a continuous approximation $h_{(t)}(x, y)$ to $h_s(x, y)$ whose maximum slope grows at most linearly in $t$ (defined
similarly to $g_{(t)}(x)$) (6), or the plane

\[ h_p(x, y) = \frac{(x + 1)\alpha}{2(\alpha + 1)} + \frac{y + 1}{2(\alpha + 1)} \tag{7} \]

that intersects $h_s(x, y)$ at the extreme points $(x, y) = (-1, -1)$, $(-1, 1)$, and $(1, 1)$ of its domain.
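Two claims in this passage can be checked numerically: that the mixed allocation used between support and resistance has a daily return of exactly 1, and that the plane agrees with the step function at the three corner points. The sketch below assumes the forms $h_s$ with middle value $1/(1+\alpha)$ and $h_p(x, y) = \frac{(x+1)\alpha + (y+1)}{2(\alpha+1)}$; the function names are hypothetical.

```python
def short_factor(x_t, alpha):
    """Daily factor of a short position with margin requirement alpha."""
    return 1 + (1 - x_t) / alpha

def h_step(x, y, alpha):
    """Step allocation: x = p_t - r_t, y = p_t - s_t, with x <= y."""
    if y < 0:   # price below support: fully short
        return 0.0
    if x > 0:   # price above resistance: fully long
        return 1.0
    return 1 / (1 + alpha)  # inside the range: neutral mix

def h_plane(x, y, alpha):
    """Plane through the values of h_step at (-1,-1), (-1,1) and (1,1)."""
    return ((x + 1) * alpha + (y + 1)) / (2 * (alpha + 1))

def day_return(long_fraction, x_t, alpha):
    """Daily wealth factor for a given long fraction (rest is short)."""
    return long_fraction * x_t + (1 - long_fraction) * short_factor(x_t, alpha)
```

With the neutral mix, gains on one leg cancel losses on the other for any price change, which is why it is equivalent to holding no position.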



2.2 Portfolio Strategies 

Portfolio strategies are investment strategies that distribute wealth among $m$ stocks. The investment
description of a portfolio strategy $P$ is $P_t = (P_{t1}, \ldots, P_{tm})$, where $0 \le P_{ti} \le 1$ and
$\sum_{i=1}^m P_{ti} = 1$. We put a fraction $P_{ti}$ of our wealth in stock $i$ at time $t$.

CRP: Constantly Rebalanced Portfolio. The parameter space for the CRP strategy is
$\mathcal{W} = \mathcal{W}_m$. The investment description is $\text{CRP}_t(w) = w$: at the beginning of each day, we invest a
$w_i$ proportion of our wealth in stock $i$.



CRP-S: Constantly Rebalanced Portfolio with Side Information. Cover and Ordentlich
consider a generalization of CRP. Rather than rebalancing our holdings according to a single
portfolio vector $w \in \mathcal{W}_m$ every day, we have $k$ vectors $w_1, \ldots, w_k \in \mathcal{W}_m$ and a side information
state $y_t \in \{1, \ldots, k\}$ that classifies each day $t$ into one of $k$ possible categories; on day $t$ we rebalance
our holdings according to $w_{y_t}$. By partitioning the time interval into $k$ subsequences corresponding
to each of the $k$ side information states and running $k$ instances of the universalization algorithm
(one instance for each state), Cover and Ordentlich show that the average daily return approaches
that of the underlying strategy operating with $k$ optimal parameters, $w_1^*, \ldots, w_k^* \in \mathcal{W}_m$, where $w_j^*$
is used on days $t$ when the side information state is $y_t = j$. We generalize this further by allowing
portions of our wealth to be rebalanced according to several of the $w_j$ every day. Suppose that
the side information is encapsulated in some vector $v \in \mathbb{R}^\ell$, for some $\ell$. This vector can contain
information about specific stocks, such as historic performance and company fundamentals, or
macro-economic indicators such as inflation and unemployment. Let $f = (f_1, \ldots, f_k) : \mathbb{R}^\ell \to [0, 1]^k$
be some function satisfying $\sum_{j=1}^k f_j(v) = 1$ for all $v \in \mathbb{R}^\ell$. The parameter space is $\mathcal{W}_m^k$; the
investment description is $\text{CRP-S}_t(w_1, \ldots, w_k) = \sum_{j=1}^k f_j(v_t) w_j$, where $v_t$ is the indicator vector
for day $t$. Under such a scheme, we have the flexibility of splitting our wealth among multiple
portfolios $w_1, \ldots, w_k$ on any given day, rather than being forced to choose a single one.
For example, assume that $v$ is a $k$-dimensional vector, with each $v_i$ corresponding to portfolio $w_i$.
Define $f : \mathbb{R}^k \to [0, 1]^k$ by $f_i(v_t) = v_{ti} / \sum_{j=1}^k v_{tj}$, so that our allocation is biased towards portfolios
corresponding to higher indicators while still maintaining a position in the others.
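The blended CRP-S allocation can be sketched as follows, assuming the proportional weighting $f_i(v_t) = v_{ti} / \sum_j v_{tj}$ with positive indicators. The function names and the sample portfolios are illustrative assumptions.

```python
def side_info_weights(v_t):
    """f_i(v_t) = v_ti / sum_j v_tj; assumes all indicators are positive."""
    total = sum(v_t)
    return [v / total for v in v_t]

def crp_s_description(portfolios, v_t):
    """Blend k portfolio vectors (each in W_m) with the side-information weights."""
    f = side_info_weights(v_t)
    m = len(portfolios[0])
    return [sum(f_j * w_j[i] for f_j, w_j in zip(f, portfolios)) for i in range(m)]

# Two hypothetical portfolios over three stocks and a two-dimensional indicator.
portfolios = [[1.0, 0.0, 0.0], [0.2, 0.3, 0.5]]
v_t = [3.0, 1.0]
allocation = crp_s_description(portfolios, v_t)
```

Because the weights sum to 1 and each portfolio lies in the simplex, the blended allocation is again a valid portfolio vector.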

IA[k]: k-Way Indicator Aggregation. For each day $t \ge 0$, suppose that each stock $i$ has a set
of $k$ indicators $v_{ti} = (v_{ti1}, \ldots, v_{tik})$, where each $v_{tij} \in (0, 1]$ and, for $1 \le j \le k$, $v_{t1j}, \ldots, v_{tmj}$ have
been normalized such that there is at least one $i$ such that $v_{tij} = 1$. Examples of possible indicators
include historic stock performance and trading volumes, and company fundamentals. Our goal is
to aggregate the indicators for each stock to get a measure of the stock's attractiveness and put a
greater proportion of our wealth in stocks that are more attractive. We will aggregate the indicators
by taking their weighted average, where the weights will be determined by the parameters. The
parameter space is $\mathcal{W} = \mathcal{W}_k$ and the investment description is

\[ \text{IA}_t(w) = \Big( \frac{w \cdot v_{t1}}{\sum_{i=1}^m w \cdot v_{ti}}, \ldots, \frac{w \cdot v_{tm}}{\sum_{i=1}^m w \cdot v_{ti}} \Big). \]
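As an illustration of indicator aggregation, the sketch below scores each stock by the weighted average of its indicators and normalizes the scores into a portfolio. It assumes the normalized form of the investment description, in which each score is divided by the sum of all scores; the names and sample data are hypothetical.

```python
def ia_description(w, indicators):
    """Allocate wealth in proportion to each stock's weighted indicator score.

    w: weight vector in the simplex W_k.
    indicators: one length-k indicator vector per stock, entries in (0, 1].
    """
    scores = [sum(wj * vj for wj, vj in zip(w, v_ti)) for v_ti in indicators]
    total = sum(scores)
    return [s / total for s in scores]

# Three hypothetical stocks with two indicators each; every indicator column
# attains the value 1 for at least one stock.
indicators = [[1.0, 0.5], [0.5, 1.0], [0.25, 0.25]]
w = [0.5, 0.5]
allocation = ia_description(w, indicators)
```

With equal weights, the first two stocks score the same and each receives three times the allocation of the third.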

3 Universalization of Investment Strategies 
3.1 Universalization Defined 

In a typical stock market, wealth grows geometrically. On day $t \ge 0$, let $x_t$ be the return vector for
day $t$, the vector of factors by which stock prices change on day $t$. The return vector corresponding
to a trading strategy on a single stock is $(x_t, 1 + \frac{1 - x_t}{\alpha})$, where $x_t$ is the factor by which the price of
the stock changes and $1 + \frac{1 - x_t}{\alpha}$ is the factor by which our investment in a short position changes,
as described in Lemma 1; the return vector corresponding to a portfolio strategy is $(x_{t1}, \ldots, x_{tm})$,
where $x_{ti}$ is the factor by which the price of stock $i$ changes, for $1 \le i \le m$. Henceforth, we do
not make a distinction between return vectors corresponding to trading and portfolio strategies;
we assume that $x_t$ is appropriately defined to correspond to the investment strategy in question.
For an investment strategy $S$ with parameter vector $w$, the return of $S(w)$ during the $t$-th day,
that is, the factor by which our wealth changes on the $t$-th day when invested according to $S(w)$, is
$S_t(w) \cdot x_t = \sum_{i=1}^m S_{ti}(w) \cdot x_{ti}$ (recall that $S_t(w)$ is the investment description of $S(w)$ for day $t$,
which is a vector specifying the proportion of wealth to put in each stock). Given time $n > 0$, let
$\mathcal{R}_n(S(w)) = \prod_{t=0}^{n-1} S_t(w) \cdot x_t$ be the cumulative return of $S(w)$ up to time $n$; we may write $\mathcal{R}_n(w)$
in place of $\mathcal{R}_n(S(w))$ if $S$ is obvious from context. We analyze the performance of $S$ in terms of
the normalized log-return $\mathcal{L}_n(w) = \mathcal{L}_n(S(w)) = \frac{1}{n} \log \mathcal{R}_n(w)$ of the wealth achieved.

For investment strategy $S$, let $w^* = \arg\max_{w} \mathcal{R}_n(S(w))$ be the parameters that maximize
the return of $S$ up to day $n$.⁴ An investment strategy $U$ universalizes (or is universal for) $S$ if⁵

\[ \mathcal{L}_n(U) = \mathcal{L}_n(S(w^*)) - o(1) \]

for all environment vectors $\mathcal{E}_n$. That is, $U$ is universal for $S$ if the average daily log-return of $U$
approaches the optimal average daily log-return of $S$ as the length $n$ of the time horizon grows,
regardless of stock price sequences.

3.2 General Techniques for Universalization 

Given an investment strategy $S$, let $\mathcal{W}$ be the parameter space for $S$ and let $\mu$ be the uniform
measure over $\mathcal{W}$. Our universalization algorithm for $S$, $U(S)$, is a generalization of Cover's original
result. The investment description $U_t(S)$ for the universalization of $S$ on day $t \ge 0$ is a weighted
average of the $S_t(w)$ over $w \in \mathcal{W}$, with greater weight given to parameters $w$ that have performed
better in the past (i.e. $\mathcal{R}_t(w)$ is larger). Formally, the investment description is

\[ U_t(S) = \frac{\int_{\mathcal{W}} S_t(w) \mathcal{R}_t(w) \, d\mu(w)}{\int_{\mathcal{W}} \mathcal{R}_t(w) \, d\mu(w)} = \frac{\int_{\mathcal{W}} S_t(w) \mathcal{R}_t(S(w)) \, d\mu(w)}{\int_{\mathcal{W}} \mathcal{R}_t(S(w)) \, d\mu(w)}, \tag{8} \]

where we take $\mathcal{R}_0(w) = 1$ for all $w \in \mathcal{W}$.⁶

Remark 5 The definition of universalization can be expanded to include measures other than $\mu$,
but we consider only $\mu$ in our results.



Lemma 3 ([3]) The cumulative $n$-day return of $U(S)$ is

\[ \mathcal{R}_n(U(S)) = \int_{\mathcal{W}} \mathcal{R}_n(w) \, d\mu(w) = \mathbb{E}(\mathcal{R}_n(w)), \]

the $\mu$-weighted average of the cumulative returns of the investment strategies $\{S(w) \mid w \in \mathcal{W}\}$.

Proof: The return of $U(S)$ on day $t$ is $U_t(S) \cdot x_t$, where $x_t$ is the return vector for day $t$. The
cumulative $n$-day return of $U(S)$ is

\[ \mathcal{R}_n(U(S)) = \prod_{t=0}^{n-1} U_t(S) \cdot x_t = \prod_{t=0}^{n-1} \frac{\int_{\mathcal{W}} (S_t(w) \cdot x_t) \mathcal{R}_t(w) \, d\mu(w)}{\int_{\mathcal{W}} \mathcal{R}_t(w) \, d\mu(w)} = \prod_{t=0}^{n-1} \frac{\int_{\mathcal{W}} \mathcal{R}_{t+1}(w) \, d\mu(w)}{\int_{\mathcal{W}} \mathcal{R}_t(w) \, d\mu(w)}. \]

The result follows from the fact that this product telescopes. ■
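The weighted-average rule and the telescoping identity can be illustrated on a discretized parameter space. The sketch below is an assumption-laden toy, not the paper's algorithm: it universalizes CRP for two stocks, replaces the uniform measure by a uniform grid over $w = (b, 1 - b)$, and uses a hypothetical market; all names are illustrative.

```python
def universal_crp_two_stocks(returns, grid_size=101):
    """Grid approximation of the universalization of CRP with m = 2.

    Returns (wealth of the universalization, average wealth over the grid).
    """
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    cum = [1.0] * grid_size  # cumulative return of each CRP(b, 1-b), starting at 1
    wealth = 1.0             # cumulative return of the universalization
    for x1, x2 in returns:
        total = sum(cum)
        # Investment description: average of (b, 1 - b) weighted by past return.
        b_t = sum(b * r for b, r in zip(grid, cum)) / total
        wealth *= b_t * x1 + (1 - b_t) * x2
        cum = [r * (b * x1 + (1 - b) * x2) for b, r in zip(grid, cum)]
    return wealth, sum(cum) / grid_size

# Stock 1 is flat; stock 2 doubles and halves on alternate days (20 days).
returns = [(1.0, 2.0), (1.0, 0.5)] * 10
wealth, average_of_returns = universal_crp_two_stocks(returns)
```

On the grid the two returned numbers agree up to rounding, which is the discrete analogue of Lemma 3: the universalization earns the average of the returns of all the parameterized strategies.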

Rather than directly universalizing a given investment strategy $S$, we instead focus on a modified
version of $S$ that puts a nonzero fraction of wealth in each of the $m$ stocks. Define the investment
strategy $\bar{S}$ by

\[ \bar{S}_t(w) = \Big( 1 - \frac{\varepsilon}{2(t + 1)^2} \Big) S_t(w) + \frac{\varepsilon}{2m(t + 1)^2} \]

for $t \ge 0$ and some fixed $0 < \varepsilon < 1$. Rather than universalizing $S$, we instead universalize $\bar{S}$.
Lemma 4 tells us that we do not lose much by doing this.

4 As mentioned above, $w^*$ can only be computed with hindsight.
5 Unlike previously discussed investment strategies, the behavior of $U$ is fully defined without an additional parameter vector $w$.
6 Cover's algorithm is a special case of this, replacing $S_t(w)$ with $w$.

Lemma 4 For all $n > 0$, (1) $\mathcal{R}_n(U(\bar{S})) \ge (1 - \varepsilon) \mathcal{R}_n(U(S))$ and (2) $\mathcal{L}_n(U(\bar{S})) = \mathcal{L}_n(U(S)) - o(1)$.
(3) If $U(S)$ is a universalization of $S$, then $U(\bar{S})$ is a universalization of $S$ as well.



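Proof: Statements (2) and (3) follow directly from (1). Statement (1) follows from Lemma 3 and the
fact that for all $w \in \mathcal{W}$,

\[ \mathcal{R}_n(\bar{S}(w)) = \prod_{t=0}^{n-1} \bar{S}_t(w) \cdot x_t \ge \prod_{t=0}^{n-1} \Big( 1 - \frac{\varepsilon}{2(t + 1)^2} \Big) S_t(w) \cdot x_t \ge \Big( 1 - \sum_{t=0}^{n-1} \frac{\varepsilon}{2(t + 1)^2} \Big) \mathcal{R}_n(S(w)) \ge (1 - \varepsilon) \mathcal{R}_n(S(w)). \]
■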
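The chain of inequalities in this proof rests on the diverted fractions $\frac{\varepsilon}{2(t+1)^2}$ summing to less than $\varepsilon$. A short numerical sketch, assuming the squared exponent in the definition of the modified strategy; the function names are hypothetical.

```python
import math

def retained_fraction(eps, n):
    """Product over t = 0..n-1 of (1 - eps / (2 (t + 1)^2))."""
    return math.prod(1 - eps / (2 * (t + 1) ** 2) for t in range(n))

def diverted_total(eps, n):
    """Sum over t = 0..n-1 of eps / (2 (t + 1)^2)."""
    return sum(eps / (2 * (t + 1) ** 2) for t in range(n))
```

Since $\sum_{t \ge 0} (t+1)^{-2} = \pi^2/6 < 2$, the total diverted fraction stays below $\varepsilon$ for every horizon, so at least a $1 - \varepsilon$ share of the original return is retained.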
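Remark 6 Henceforth, we assume that suitable modifications have been made to $S$ to ensure that $S_{ti}(w) \ge \frac{\varepsilon}{2m(t + 1)^2}$ for all $1 \le i \le m$, $t \ge 0$ and $w \in \mathcal{W}$.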

Theorem 5 Given an investment strategy $S$, let $\mathcal{W} = \mathcal{W}_k^\ell$ (for some $k \ge 2$ and $\ell \ge 1$) be its
parameter space. For $1 \le i \le m$, $1 \le \iota \le \ell$ and $1 \le j \le k$, assume that there is a constant $c$ such
that $\big| \frac{\partial S_{ti}(w)}{\partial w_{\iota j}} \big| \le c(t + 1)$ for all $w \in \mathcal{W}$. Then $U(S)$ is a universalization of $S$.

To prove Theorem 5 we first prove some preliminary results.

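Lemma 6 For nonnegative vector $a$ and strictly positive vectors $b$ and $x$,

\[ \min_i \frac{a_i}{b_i} \le \frac{a \cdot x}{b \cdot x} \le \max_i \frac{a_i}{b_i}. \]

Proof: Assume that the components of $a$ and $b$ are strictly positive. Otherwise, the lemma holds
trivially. Let $i_{\max} = \arg\max_i \frac{a_i}{b_i}$ and $i_{\min} = \arg\min_i \frac{a_i}{b_i}$, so that for all $i$,

\[ \frac{a_i}{b_i} \le \frac{a_{i_{\max}}}{b_{i_{\max}}} \iff \frac{a_i}{a_{i_{\max}}} \le \frac{b_i}{b_{i_{\max}}} \quad \text{and} \quad \frac{a_i}{b_i} \ge \frac{a_{i_{\min}}}{b_{i_{\min}}} \iff \frac{a_i}{a_{i_{\min}}} \ge \frac{b_i}{b_{i_{\min}}}. \]

Then

\[ \frac{a_{i_{\min}}}{b_{i_{\min}}} \le \frac{a_{i_{\min}} \big( x_{i_{\min}} + \sum_{i \ne i_{\min}} \frac{a_i}{a_{i_{\min}}} x_i \big)}{b_{i_{\min}} \big( x_{i_{\min}} + \sum_{i \ne i_{\min}} \frac{b_i}{b_{i_{\min}}} x_i \big)} = \frac{a \cdot x}{b \cdot x} = \frac{a_{i_{\max}} \big( x_{i_{\max}} + \sum_{i \ne i_{\max}} \frac{a_i}{a_{i_{\max}}} x_i \big)}{b_{i_{\max}} \big( x_{i_{\max}} + \sum_{i \ne i_{\max}} \frac{b_i}{b_{i_{\max}}} x_i \big)} \le \frac{a_{i_{\max}}}{b_{i_{\max}}}. \]
■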
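Lemma 6 says that a ratio of two inner products against the same positive vector is sandwiched between the smallest and largest componentwise ratios. The following randomized check is only a sanity test under assumed sample ranges, with hypothetical names.

```python
import random

def ratio_bounds_hold(a, b, x, tol=1e-9):
    """Check min_i a_i/b_i <= (a.x)/(b.x) <= max_i a_i/b_i up to rounding."""
    ratio = sum(ai * xi for ai, xi in zip(a, x)) / sum(bi * xi for bi, xi in zip(b, x))
    ratios = [ai / bi for ai, bi in zip(a, b)]
    return min(ratios) - tol <= ratio <= max(ratios) + tol

rng = random.Random(0)  # fixed seed so the check is reproducible
trials = [
    ([rng.random() for _ in range(5)],        # a: nonnegative
     [rng.random() + 0.1 for _ in range(5)],  # b: strictly positive
     [rng.random() + 0.1 for _ in range(5)])  # x: strictly positive
    for _ in range(1000)
]
all_hold = all(ratio_bounds_hold(a, b, x) for a, b, x in trials)
```

The bound holds because $(a \cdot x)/(b \cdot x)$ is a weighted average of the ratios $a_i/b_i$ with positive weights $b_i x_i$.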



Our next two results are related to the $(k - 1)$-dimensional volumes of some subsets of $\mathbb{R}^k$.

Lemma 7 The $(k - 1)$-dimensional volume of the simplex $\mathcal{W}_k = \{w \in [0, 1]^k \mid \sum_{i=1}^k w_i = 1\}$,
defined in (1), is $\frac{\sqrt{k}}{(k - 1)!}$.

Proof: By induction on $k$, it can be shown that the $k$-dimensional volume of the solid
$\{w \in [0, \infty)^k \mid \sum_{i=1}^k w_i \le s\}$ is $\frac{s^k}{k!}$. Written in terms of the length $r$ of the line segment passing between the
origin and $(\frac{s}{k}, \ldots, \frac{s}{k}) \in \mathbb{R}^k$, the volume is $\frac{r^k k^{k/2}}{k!}$ since $s = r\sqrt{k}$. Upon differentiation with respect
to $r$, we arrive at the $(k - 1)$-dimensional volume $\frac{\sqrt{k}\, s^{k-1}}{(k - 1)!}$ of the simplex
$\mathcal{W}_k(s) = \{w \in [0, \infty)^k \mid \sum_{i=1}^k w_i = s\}$. Setting $s = 1$ yields the desired result. ■
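The value $\sqrt{k}/(k-1)!$ can be cross-checked with a different method from the one in the proof: the volume of a simplex equals the square root of the Gram determinant of its edge vectors divided by $(k-1)!$. The sketch below uses that swapped-in technique; the helper names are hypothetical.

```python
import math

def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n, det = len(a), 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-15:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det

def simplex_volume(k):
    """(k-1)-dimensional volume of W_k via the Gram matrix of e_i - e_k."""
    # (e_i - e_k) . (e_j - e_k) equals 2 if i == j and 1 otherwise.
    gram = [[2.0 if i == j else 1.0 for j in range(k - 1)] for i in range(k - 1)]
    return math.sqrt(determinant(gram)) / math.factorial(k - 1)
```

For $k = 2$ this gives the length $\sqrt{2}$ of the segment from $(1, 0)$ to $(0, 1)$, and for $k = 3$ the area $\sqrt{3}/2$ of an equilateral triangle with side $\sqrt{2}$.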



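Lemma 8 The $(k - 1)$-dimensional volume of a $(k - 1)$-dimensional ball of radius $\rho$ embedded in
$\mathbb{R}^k$ is $\frac{\pi^{(k-1)/2} \rho^{k-1}}{\Gamma(\frac{k-1}{2} + 1)}$, where $\Gamma(k) = (k - 1)!$ and $\Gamma(k + \frac12) = (k - \frac12)(k - \frac32) \cdots \frac12 \sqrt{\pi}$.

Proof: This result is proven in Folland (Corollary 2.56). ■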
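For reference, the ball-volume formula is easy to evaluate with the standard gamma function. In this sketch `dim` plays the role of $k - 1$; the function name is an assumption.

```python
import math

def ball_volume(dim, rho):
    """Volume of a dim-dimensional ball: pi^(dim/2) rho^dim / Gamma(dim/2 + 1)."""
    return math.pi ** (dim / 2) * rho ** dim / math.gamma(dim / 2 + 1)
```

In low dimensions it reduces to the familiar values: an interval of length $2\rho$, a disc of area $\pi\rho^2$, and a ball of volume $\frac43 \pi \rho^3$.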

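Proof of Theorem 5: From Lemma 3, the return of $U(S)$ is the average of the cumulative returns
of the investment strategies $\{S(w) \mid w \in \mathcal{W}\}$. Let $w^* = \arg\max_{w \in \mathcal{W}} \mathcal{R}_n(S(w))$ be the parameters
that maximize the return of $S$. We show that there is a set $B$ of nonzero volume around $w^*$ such
that for $w \in B$, the return $\mathcal{R}_n(w)$ is close to the optimal return $\mathcal{R}_n(w^*)$. We then show that the
contribution to the average return from $B$ is sufficiently large to ensure universalizability. We begin
by bounding the magnitude of the gradient vector $\nabla \mathcal{R}_n(w)$. From Remark 6 and our assumption
in the statement of the theorem, for all $w$, $t$, $i$, $\iota$, and $j$,

\[ \frac{\big| \frac{\partial}{\partial w_{\iota j}} S_{ti}(w) \big|}{S_{ti}(w)} \le c' m (t + 1)^3, \]

where $c' = \frac{2c}{\varepsilon}$. Using this fact and Lemma 6, the partial derivative of the return function
$\mathcal{R}_n(w) = \mathcal{R}_n(S(w)) = \prod_{t=0}^{n-1} S_t(w) \cdot x_t$ with respect to parameter $w_{\iota j}$ satisfies

\[ \Big| \frac{\partial \mathcal{R}_n(w)}{\partial w_{\iota j}} \Big| = \mathcal{R}_n(w) \Big| \sum_{t=0}^{n-1} \frac{\frac{\partial}{\partial w_{\iota j}} (S_t(w) \cdot x_t)}{S_t(w) \cdot x_t} \Big| \le \mathcal{R}_n(w) \sum_{t=0}^{n-1} \frac{\big| \frac{\partial S_t(w)}{\partial w_{\iota j}} \big| \cdot x_t}{S_t(w) \cdot x_t} \le \mathcal{R}_n(w) \sum_{t=0}^{n-1} c' m (t + 1)^3 \le c' \mathcal{R}_n(w) m n^4, \]

where the absolute value of $\frac{\partial S_t(w)}{\partial w_{\iota j}}$ is taken componentwise, and

\[ |\nabla \mathcal{R}_n(w)| \le c' \mathcal{R}_n(w) m n^4 \sqrt{k\ell}. \tag{9} \]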



We would like to take our set $B$ to be some $d$-dimensional ball around $w^*$; unfortunately, if $w^*$ is
on (or close to) an edge of $\mathcal{W}$, the reasoning introduced at the beginning of this proof is not valid.
We instead perturb $w^*$ to a point $\tilde{w}$ that is at least

\[ \rho = \frac{\gamma}{c' m n^4 k^2 \ell} \]

away from all edges, where $0 < \gamma < 1$ is a constant, and such that $\mathcal{R}_n(\tilde{w})$ is close to $\mathcal{R}_n(w^*)$.
To illustrate the perturbation, let $w^* = (w_1^*, \ldots, w_\ell^*)$ where $w_\iota^* = (w_{\iota 1}^*, \ldots, w_{\iota k}^*)$ and
$w_{\iota k}^* = 1 - \sum_{j=1}^{k-1} w_{\iota j}^*$ for $1 \le \iota \le \ell$. We perturb each $w_\iota^*$ in the same way. Let $w_\iota^0 = w_\iota^*$. For $1 \le j \le k$,
given $w_\iota^{j-1}$, define $w_\iota^j$ as follows. Let $j_{\max}$ be the index of the maximum coordinate of $w_\iota^{j-1}$. If
$0 \le w_{\iota j}^{j-1} < \rho$, define $w_{\iota j}^j = w_{\iota j}^{j-1} + \rho$, $w_{\iota j_{\max}}^j = w_{\iota j_{\max}}^{j-1} - \rho$ and leave all other coordinates unchanged.
Otherwise, let $w_\iota^j = w_\iota^{j-1}$. The final perturbation is $\tilde{w} = (\tilde{w}_1, \ldots, \tilde{w}_\ell)$, where $\tilde{w}_\iota = w_\iota^k$. By
construction, $\tilde{w} \in \mathcal{W}$, $\tilde{w}$ is at least $\rho$ away from the edges of $\mathcal{W}$ and $|w_{\iota j}^* - \tilde{w}_{\iota j}| \le k\rho$ for all $\iota$ and $j$.
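The perturbation step can be sketched for a single simplex component as follows. This is an illustrative reading of the procedure, with the added assumption $\rho \le 1/(2k)$ so that the largest coordinate can always give up $\rho$ and still remain at least $\rho$; the function name is hypothetical.

```python
def perturb(w_star, rho):
    """Move a point of the simplex W_k so that every coordinate is at least rho."""
    w = list(w_star)
    k = len(w)
    if rho > 1 / (2 * k):
        raise ValueError("this sketch assumes rho <= 1 / (2k)")
    for j in range(k):
        j_max = max(range(k), key=lambda i: w[i])
        if w[j] < rho:
            w[j] += rho      # push coordinate j away from the boundary
            w[j_max] -= rho  # compensate so the coordinates still sum to 1
    return w

# A vertex of the simplex is the worst case: two coordinates sit on the boundary.
w_tilde = perturb([0.0, 0.0, 1.0], rho=0.05)
```

Each coordinate is touched at most $k$ times by steps of size $\rho$, which gives the $k\rho$ bound on the displacement of any coordinate.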

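We bound $\mathcal{R}_n(\tilde{w})$ by the multivariate mean value theorem and the Cauchy-Schwarz inequality:

\begin{align*}
\mathcal{R}_n(\tilde{w}) &= \mathcal{R}_n(w^*) + \mathcal{R}_n(\tilde{w}) - \mathcal{R}_n(w^*) \\
&\ge \mathcal{R}_n(w^*) - |\nabla \mathcal{R}_n(w') \cdot (\tilde{w} - w^*)| \quad \text{(for some $w'$ between $\tilde{w}$ and $w^*$)} \\
&\ge \mathcal{R}_n(w^*) - |\nabla \mathcal{R}_n(w')| \cdot |\tilde{w} - w^*| \ge \mathcal{R}_n(w^*) - c' \mathcal{R}_n(w') m n^4 \sqrt{k\ell} \cdot k\rho\sqrt{k\ell} \\
&\ge \mathcal{R}_n(w^*) - c' \mathcal{R}_n(w^*) m n^4 \sqrt{k\ell} \cdot k\rho\sqrt{k\ell} \ge \mathcal{R}_n(w^*)(1 - \gamma).
\end{align*}

For $1 \le \iota \le \ell$ let $C_\iota = \{w_\iota \in \mathbb{R}^k \mid |w_\iota - \tilde{w}_\iota| \le \rho\}$. From the construction of $\tilde{w}$, $B_\iota = C_\iota \cap \mathcal{W}_k$ is
a $(k - 1)$-dimensional ball of radius $\rho$. Let $\bar{w}^* = (\bar{w}_1^*, \ldots, \bar{w}_\ell^*) = \arg\max_{w \in B} \mathcal{R}_n(w)$
be the profit maximizing parameters in $B = B_1 \times \cdots \times B_\ell$. For $w \in B$,

\begin{align*}
\mathcal{R}_n(w) &= \mathcal{R}_n(\bar{w}^*) + \mathcal{R}_n(w) - \mathcal{R}_n(\bar{w}^*) \\
&\ge \mathcal{R}_n(\bar{w}^*) - |\nabla \mathcal{R}_n(w')| \cdot |\bar{w}^* - w| \quad \text{(for some $w'$ between $\bar{w}^*$ and $w$)} \\
&\ge \mathcal{R}_n(\bar{w}^*) - c' \mathcal{R}_n(\bar{w}^*) m n^4 \sqrt{k\ell} \cdot 2\rho\sqrt{\ell} \ge \mathcal{R}_n(\bar{w}^*)(1 - \gamma) \\
&\ge \mathcal{R}_n(w^*)(1 - 2\gamma).
\end{align*}

By Lemma 3,

\begin{align*}
\mathcal{R}_n(U(S)) &= \int_{\mathcal{W}} \mathcal{R}_n(S(w)) \, d\mu(w) \ge \int_B \mathcal{R}_n(w) \, d\mu(w) \ge (1 - 2\gamma) \mathcal{R}_n(w^*) \int_B d\mu(w) \\
&\ge (1 - 2\gamma) \mathcal{R}_n(w^*) \frac{\int_B dw}{\int_{\mathcal{W}} dw} \\
&= (1 - 2\gamma) \mathcal{R}_n(w^*) \Big( \frac{\pi^{(k-1)/2} \rho^{k-1}}{\Gamma(\frac{k-1}{2} + 1)} \cdot \frac{(k - 1)!}{\sqrt{k}} \Big)^{\ell} \quad \text{(from Lemmas 7 and 8)} \\
&= \mathcal{R}_n(w^*) A(\gamma, m, k, \ell) \, n^{-4\ell(k-1)},
\end{align*}

where $A$ is some constant depending on $\gamma$, $m$, $k$, and $\ell$. Therefore,

\[ \mathcal{L}_n(w^*) - \mathcal{L}_n(U(S)) \le -\frac{\log A(\gamma, m, k, \ell)}{n} + 4\ell(k - 1) \frac{\log n}{n} = o(1) \tag{10} \]

as claimed. ■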

Remark 7 The techniques used in the proof of Theorem 5 can be generalized to other investment
strategies with bounded parameter spaces $\mathcal{W}$ that are not necessarily of the form $\mathcal{W}_k^\ell$.



3.3 Increasing the Number of Parameters with Time 

The reader may notice from the proof of Theorem 5 that an investment strategy $S$ may be universalizable
even if the dimension of its parameter space $\mathcal{W}$ grows with time. In fact, even if the
dimension of the parameter space (the coefficient of $\frac{\log n}{n}$ in (10)) is $O\big(\frac{n}{\phi(n) \log n}\big)$, where $\phi(n)$ is a
monotone increasing function, the strategy is still universalizable. This introduces an interesting
possibility for investment strategies whose parameter spaces grow with time as more information
becomes available. As a simple example, consider dynamic universalization, which allows us to
track a higher-return benchmark than basic universalization. Partition the time interval $I = [0, n)$
into $\psi = O\big(\frac{n}{\phi(n) \log n}\big)$ subintervals $I_1, \ldots, I_\psi$ and let $w_j$ be the parameters that optimize the
return during $I_j$. In $I_1$, we run the universalization algorithm given by (8) over the basic parameter
space $\mathcal{W}$ of $S$. In $I_2$, we run the algorithm over $\mathcal{W} \times \mathcal{W}$; to compute the investment description for
a day $t \in I_2$ using (8), we compute the return $\mathcal{R}_t(w_1, w_2)$ as the product of the returns we would
have earned in $I_1$ using $w_1$ and what we would have earned up to day $t$ in $I_2$ using $w_2$. We proceed
similarly in intervals $I_3$ through $I_\psi$. This will allow us to track the strategy that uses the optimal
parameters $w_j$ corresponding to each $I_j$. Such a strategy is useful in environments where optimal
investment styles (and the optimal investment strategy parameters that go with them) change with
time.



3.4 Applications to Trading Strategies 



By proving an upper bound on $\left|\frac{\partial T_{ti}(w)}{\partial w_j}\right|$ for our trading strategies $T$, we show that they are universalizable.



Theorem 9 The moving average cross-over trading strategy, MA[$k$], is universalizable for the long/short allocation functions $g_{(t)}(x)$ and $g_e(x)$ defined earlier.

Proof: The parameters for MA[$k$] are of the form $w_F = (w_{F1}, \ldots, w_{F(k-1)}, 1 - w_{F1} - \cdots - w_{F(k-1)})$ and $w_S = (w_{S1}, \ldots, w_{S(k-1)}, 1 - w_{S1} - \cdots - w_{S(k-1)})$. Using the long/short allocation function $g_{(t)}(x)$, the partial derivative of the investment description with respect to a parameter $w_{Fj}$ (or similarly $w_{Sj}$) is



dMA ti (w F ,ws) 



dw 



dg((w F - wg) • v t ) 



dw 



Fj 



< g " fa ~ v tk) < 2 



where 1 < j < k and i £ {1,2}. Similarly, we can show that using the long/short allocation 
function g e (x) defined in (|), 9MA fj/' Ws) < \. ■ 



Theorem 10 The support and resistance breakout trading strategy, SR[$k$], is universalizable for the long/short allocation functions $h_{(t)}(x, y)$ and $h_p(x, y)$ defined earlier.

Proof: We arrive at the result by differentiating the long/short allocation functions $h_{(t)}(x, y)$ and $h_p(x, y)$ with respect to an arbitrary parameter $w_j$ and showing that the partial derivative is $O(t)$, as in the proof of Theorem 9. ■
3.5 Applications to Portfolio Strategies

Theorem 11 The constantly rebalanced portfolio, CRP, and CRP with side information, CRP-S, 
portfolio strategies are universalizable. 

Proof: The partial derivatives of $\mathrm{CRP}_{ti}$ and $\mathrm{CRP\text{-}S}_{ti}$ with respect to an arbitrary parameter $w_j$ are at most 1. ■

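Theorem 12 The $k$-way indicator aggregation portfolio strategy, IA[$k$], is universalizable.

Proof: First, we show that $\sum_{\ell=1}^{m} w \cdot v_{t\ell} \ge \frac{1}{k}$ for all $t$. Since $\sum_{j=1}^{k} w_j = 1$, there exists $j_0$ such that $w_{j_0} \ge \frac{1}{k}$. Then $\sum_{\ell=1}^{m} w \cdot v_{t\ell} \ge \sum_{\ell=1}^{m} w_{j_0} v_{t\ell j_0} \ge \frac{1}{k}\sum_{\ell=1}^{m} v_{t\ell j_0} \ge \frac{1}{k}$ since the $\{v_{tij}\}_{1 \le i \le m}$ have been normalized such that there is at least one $\ell_0$ such that $v_{t\ell_0 j} = 1$.

Now, let $S = \mathrm{IA}[k]$. By Theorem 5, we need only show that $\left|\frac{\partial S_{ti}(w)}{\partial w_j}\right| = O(t)$, for $1 \le j \le k-1$. For $t \ge 0$ and $1 \le i \le m$ recall that $S_{ti}(w) = \frac{w \cdot v_{ti}}{\sum_{\ell=1}^{m} w \cdot v_{t\ell}}$. Then, for $1 \le j \le k-1$, since $w = (w_1, \ldots, w_{k-1}, 1 - (w_1 + \cdots + w_{k-1}))$ and the normalized $v_{tij}$ lie in $[0, 1]$,

$$\left|\frac{\partial S_{ti}(w)}{\partial w_j}\right| = \left|\frac{v_{tij} - v_{tik}}{\sum_{\ell=1}^{m} w \cdot v_{t\ell}} - \frac{(w \cdot v_{ti})\sum_{\ell=1}^{m}(v_{t\ell j} - v_{t\ell k})}{\left(\sum_{\ell=1}^{m} w \cdot v_{t\ell}\right)^2}\right| \le \frac{1}{1/k} + \frac{m}{(1/k)^2} = k + mk^2,$$

as we wanted to show. ■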



4 Fast Computation of Universal Investment Strategies 
4.1 Approximation by Sampling 

The running time of the universalization algorithm depends on the time to compute the integral in (8). A straightforward evaluation of it takes time exponential in the number of parameters. Following Kalai and Vempala [13], we propose to approximate it by sampling the parameters according to a biased distribution, giving greater weight to better performing parameters. Define the measure $\zeta_t$ on $W$ by

$$d\zeta_t(w) = \frac{\mathcal{R}_t(S(w))}{\int_W \mathcal{R}_t(S(w))\,d\mu(w)}\,d\mu(w).$$

Lemma 13 ([13]) The investment description $U_t(S)$ for universalization is the average of $S_t(w)$ with respect to the $\zeta_t$ measure.

Proof: The average of $S_t(w)$ with respect to $\zeta_t$ is

$$E_{w \in (W, \zeta_t)}(S_t(w)) = \int_W S_t(w)\,d\zeta_t(w) = \frac{\int_W S_t(w)\,\mathcal{R}_t(S(w))\,d\mu(w)}{\int_W \mathcal{R}_t(S(w))\,d\mu(w)} = U_t(S),$$

where the final equality follows from (8). ■
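The averaging view of Lemma 13 can be illustrated with a small Monte Carlo sketch. It is not the biased sampler of Section 4.2: it draws parameters uniformly (from $\mu$) and weights each draw by its return, which estimates the same average. The strategy is assumed to be CRP, so that $S_t(w) = w$; the function names and the sample count are illustrative choices, not part of the paper.

```python
import random

def sample_simplex(m, rng):
    """Draw a point uniformly from the simplex using normalized exponentials."""
    e = [rng.expovariate(1.0) for _ in range(m)]
    s = sum(e)
    return [x / s for x in e]

def crp_return(w, price_relatives):
    """Cumulative return of the constantly rebalanced portfolio w."""
    r = 1.0
    for x in price_relatives:
        r *= sum(wi * xi for wi, xi in zip(w, x))
    return r

def universal_description(price_relatives, m, n_samples=2000, seed=0):
    """Estimate U_t(S) as the return-weighted average of S_t(w) = w."""
    rng = random.Random(seed)
    num = [0.0] * m
    den = 0.0
    for _ in range(n_samples):
        w = sample_simplex(m, rng)            # w drawn from the uniform measure
        r = crp_return(w, price_relatives)    # weight proportional to zeta_t(w)
        den += r
        for i in range(m):
            num[i] += r * w[i]
    return [x / den for x in num]
```

With an empty history every weight equals 1, so the estimate is close to the uniform portfolio; after a run of days on which the first stock outperforms, the estimate tilts toward that stock.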



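In Section 4.2, we show that for certain strategies we can efficiently sample from a distribution $\tilde{\zeta}_t$ that is "close" to $\zeta_t$, i.e. given $\gamma_t > 0$, we generate samples from $\tilde{\zeta}_t$ in $O(\log\frac{1}{\gamma_t})$ time and such that

$$\int_W |\zeta_t(w) - \tilde{\zeta}_t(w)|\,d\mu(w) \le \gamma_t. \quad (11)$$

Assume for now that we can sample from $\tilde{\zeta}_t$, with $\gamma_t = \frac{\epsilon^2}{4m(t+1)^4}$, where $\epsilon$ is the constant appearing in Remark 6. Let $\tilde{U}_t(S) = \int_W S_t(w)\,d\tilde{\zeta}_t(w)$ be the corresponding approximation to $U_t(S)$. Lemma 14 tells us that we do not lose much by sampling from $\tilde{\zeta}_t$.

Lemma 14 For all $n \ge 0$, (1) $\mathcal{R}_n(\tilde{U}(S)) \ge (1 - \epsilon)\,\mathcal{R}_n(U(S))$ and (2) if $U(S)$ is a universalization of $S$, then $\tilde{U}(S)$ is a universalization of $S$ as well.

Proof: Statement (2) follows directly from (1). To see (1), we need only show that the fraction of wealth we put in each stock $i$ on day $t$ under $\tilde{U}(S)$ is within a $1 - \frac{\epsilon}{2(t+1)^2}$ factor of the corresponding amount under $U(S)$, i.e. $\tilde{U}_{ti}(S) \ge \left(1 - \frac{\epsilon}{2(t+1)^2}\right) U_{ti}(S)$ for $0 \le t < n$ and $1 \le i \le m$. For $w \in W$, let $\gamma_t(w) = |\zeta_t(w) - \tilde{\zeta}_t(w)|$, so that $\int_W \gamma_t(w)\,d\mu(w) = \gamma_t \le \frac{\epsilon^2}{4m(t+1)^4}$. We have

$$\begin{aligned}
\tilde{U}_{ti}(S) &= \int_W S_{ti}(w)\,\tilde{\zeta}_t(w)\,d\mu(w) \ge \int_W S_{ti}(w)\,(\zeta_t(w) - \gamma_t(w))\,d\mu(w) \\
&= U_{ti}(S) - \int_W S_{ti}(w)\,\gamma_t(w)\,d\mu(w) \ge U_{ti}(S) - \gamma_t \quad \text{(since $S_{ti}(w) \le 1$)} \\
&\ge \left(1 - \frac{\epsilon}{2(t+1)^2}\right) U_{ti}(S) \quad \text{(since $U_{ti}(S) \ge \min_w S_{ti}(w) \ge \frac{\epsilon}{2m(t+1)^2}$ and $\gamma_t \le \frac{\epsilon^2}{4m(t+1)^4}$)},
\end{aligned}$$

as we wanted to show. ■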
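Statement (1) of Lemma 14 follows from the per-day bound because the daily factors multiply. The sketch below checks this numerically, assuming the per-day factor is $1 - \frac{\epsilon}{2(t+1)^2}$ as read from the proof above; the function name is hypothetical.

```python
def wealth_factor(eps, n):
    """Product over t = 0..n-1 of the per-day factor 1 - eps / (2 (t+1)^2)."""
    prod = 1.0
    for t in range(n):
        prod *= 1.0 - eps / (2.0 * (t + 1) ** 2)
    return prod
```

Because $\sum_{t \ge 0} \frac{1}{2(t+1)^2} = \frac{\pi^2}{12} < 1$, the product never drops below $1 - \epsilon$, however many days are included.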

By sampling from $\tilde{\zeta}_t$, we use a generalization of the Chernoff bound to get an approximation $\hat{U}(S)$ to $\tilde{U}(S)$ such that with high probability $\hat{U}_{ti}(S) \ge \left(1 - \frac{\epsilon}{2(t+1)^2}\right)\tilde{U}_{ti}(S)$ for $0 \le t < n$ and $1 \le i \le m$. Using an argument similar to that in the proof of Lemma 14, we see that if $U(S)$ is a universalization of $S$, then such a $\hat{U}(S)$ is a universalization of $S$ as well. Choose $w_1, \ldots, w_{N_t} \in W$ at random according to distribution $\tilde{\zeta}_t$ and let $\hat{U}_{ti}(S) = \frac{1}{N_t}\sum_{j=1}^{N_t} S_{ti}(w_j)$. Lemma 15 discusses the number of samples $N_t$ required to get a sufficiently good approximation to $\tilde{U}_t(S)$.

Lemma 15 Given $0 < \delta < 1$, use $N_t \ge \frac{8m^2(t+1)^8}{\epsilon^4}\log\frac{2m(t+1)^2}{\delta}$ samples from $\tilde{\zeta}_t$ to compute $\hat{U}_t(S)$, where $\epsilon$ is the constant appearing in Remark 6. With probability $1 - \delta$, $\hat{U}_{ti}(S) \ge \left(1 - \frac{\epsilon}{2(t+1)^2}\right)\tilde{U}_{ti}(S)$ for all $1 \le i \le m$ and $t \ge 0$.

Proof: Hoeffding [12] proves a general version of the Chernoff bound. For random variables $0 \le X_i \le 1$ with $E(X_i) = \mu$ and $\bar{X} = \frac{1}{N}\sum_{i=1}^{N} X_i$ the bound states that $\Pr(\bar{X} \le (1 - a)\mu) \le e^{-2Na^2\mu^2}$. In our case, we would like $\hat{U}_{ti} \ge \left(1 - \frac{\epsilon}{2(t+1)^2}\right)\tilde{U}_{ti}$. As this must hold for $1 \le i \le m$ and $t \ge 0$ with total probability $1 - \delta$, we require $\Pr\left(\hat{U}_{ti} < \left(1 - \frac{\epsilon}{2(t+1)^2}\right)\tilde{U}_{ti}\right) \le \frac{\delta}{2m(t+1)^2}$ for each $i$ and $t$. From our assumption stated in Remark 6, $\mu = \tilde{U}_{ti} \ge \frac{\epsilon}{2m(t+1)^2}$ and the desired probability bound is achieved with $N_t \ge \frac{8m^2(t+1)^8}{\epsilon^4}\log\frac{2m(t+1)^2}{\delta}$ samples. ■
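The sample count in Lemma 15 can be derived mechanically from Hoeffding's bound. The following sketch does so under three assumptions read from the proof: the relative error is $a = \frac{\epsilon}{2(t+1)^2}$, the mean is at least $\frac{\epsilon}{2m(t+1)^2}$, and the failure budget for each pair $(i, t)$ is $\frac{\delta}{2m(t+1)^2}$. The function name is hypothetical.

```python
import math

def samples_needed(m, t, eps, delta):
    """Smallest N with exp(-2 N a^2 mu^2) <= per-(i, t) failure budget."""
    a = eps / (2.0 * (t + 1) ** 2)             # allowed relative error
    mu_min = eps / (2.0 * m * (t + 1) ** 2)    # assumed lower bound on the mean
    fail = delta / (2.0 * m * (t + 1) ** 2)    # failure probability per (i, t)
    return math.ceil(math.log(1.0 / fail) / (2.0 * a * a * mu_min * mu_min))
```

Solving the exponent for $N$ reproduces the closed form $\frac{8m^2(t+1)^8}{\epsilon^4}\log\frac{2m(t+1)^2}{\delta}$, so the required number of samples grows as $(t+1)^8$ up to a logarithmic factor.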
4.2 Efficient Sampling

We now discuss how to sample from $W = W_k^\ell = W_k \times \cdots \times W_k$ according to distribution $\zeta_t(\cdot) \propto \mathcal{R}_t(\cdot) = \mathcal{R}_t(S(\cdot))$. $W$ is a convex set of diameter $d = \sqrt{2\ell}$. We focus on a discretization of the sampling problem. Choose an orthogonal coordinate system on each $W_k$ and partition it into hypercubes of side length $\delta_t$, where $\delta_t$ is a constant chosen below. Let $\Omega$ be the set of centers of cubes that intersect $W$ and choose the partition such that the coordinates of $w \in \Omega$ are multiples of $\delta_t$. For $w \in \Omega$, let $C(w)$ be the cube with center $w$. We show how to choose $w \in \Omega$ with probability "close to" $\pi_t(w) \propto \mathcal{R}_t(w)$.

In particular, we sample from a distribution $\tilde{\pi}_t$ that satisfies

$$\sum_{w \in \Omega} |\pi_t(w) - \tilde{\pi}_t(w)| \le \gamma_t = \frac{\epsilon^2}{4m(t+1)^4}. \quad (12)$$

Note that this is a discretization of (11). We will also have that for each $w \in \Omega$,

$$\frac{\tilde{\pi}_t(w)}{\pi_t(w)} \le 2. \quad (13)$$

We would like to choose $\delta_t$ sufficiently small that $\mathcal{R}_t$ is "nearly constant" over $C(w)$, i.e. there is a small constant $\nu > 0$ such that

$$(1 + \nu)^{-1}\,\mathcal{R}_t(w) \le \mathcal{R}_t(w') \le (1 + \nu)\,\mathcal{R}_t(w) \quad (14)$$

for all $w' \in C(w)$. Such a $\delta_t$ can be chosen for investment strategies $S$ that have bounded derivative, as we see in Lemma 16.

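Lemma 16 Suppose that investment strategy $S$ satisfies the condition for universalizability given in Theorem 5, i.e. $\left|\frac{\partial S_{ti}(w)}{\partial w_{\iota j}}\right| \le ct$. Given $\nu > 0$, let $\delta_t = \delta_t(\nu) = \frac{\nu}{2c' m t^4 k\ell}$, where $c'$ is defined in the proof of Theorem 5. For $w, w' \in W$ such that $|w_{\iota j} - w'_{\iota j}| \le \delta_t(\nu)$ for all $1 \le \iota \le \ell$ and $1 \le j \le k$,

$$(1 + \nu)^{-1}\,\mathcal{R}_t(w) \le \mathcal{R}_t(w') \le (1 + \nu)\,\mathcal{R}_t(w).$$

Proof: Note that $|w - w'| \le \delta_t\sqrt{k\ell}$. Let $w^*$ be the parameters that maximize the return on the line between $w$ and $w'$. By the multivariate mean value theorem and the bound for $|\nabla\mathcal{R}_t|$ given in the proof of Theorem 5,

$$\begin{aligned}
\mathcal{R}_t(w^*) &= \mathcal{R}_t(w) + \mathcal{R}_t(w^*) - \mathcal{R}_t(w) \\
&\le \mathcal{R}_t(w) + |\nabla\mathcal{R}_t(w^m)| \cdot |w - w^*| \quad \text{(for some $w^m$ between $w^*$ and $w$)} \\
&\le \mathcal{R}_t(w) + c'\,\mathcal{R}_t(w^m)\, m t^4 \sqrt{k\ell} \cdot \delta_t\sqrt{k\ell} \le \mathcal{R}_t(w) + \frac{\nu}{2}\,\mathcal{R}_t(w^*).
\end{aligned}$$

Hence

$$\mathcal{R}_t(w) \ge \mathcal{R}_t(w^*)\left(1 - \frac{\nu}{2}\right) \ge \mathcal{R}_t(w')\left(1 - \frac{\nu}{2}\right),$$

so that $\mathcal{R}_t(w') \le (1 + \nu)\,\mathcal{R}_t(w)$. By similar reasoning,

$$\begin{aligned}
\mathcal{R}_t(w') &= \mathcal{R}_t(w^*) + \mathcal{R}_t(w') - \mathcal{R}_t(w^*) \\
&\ge \mathcal{R}_t(w^*) - |\nabla\mathcal{R}_t(w^m)| \cdot |w' - w^*| \quad \text{(for some $w^m$ between $w^*$ and $w'$)} \\
&\ge \mathcal{R}_t(w^*)\left(1 - \frac{\nu}{2}\right) \ge \mathcal{R}_t(w)\left(1 - \frac{\nu}{2}\right) \ge \mathcal{R}_t(w)(1 + \nu)^{-1},
\end{aligned}$$

completing the proof. ■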



We use a Metropolis algorithm [14] to sample from $\pi_t$. We generate a random walk on $\Omega$ according to a Markov chain whose stationary distribution is $\pi_t$. Begin by selecting a point $w_0 \in \Omega$ according to either $\tilde{\pi}_{t-1}$ or $\tilde{\pi}_{t-2}$.^7 Remark 8 explains how to do this.

Remark 8 We can select a point according to $\tilde{\pi}_{t-1}$ by "saving" our samples that were generated at time $t - 1$. By Lemma 15, we would have generated $N_{t-1} \ge \frac{8m^2 t^8}{\epsilon^4}\log\frac{2mt^2}{\delta}$ samples at time $t - 1$, which is not enough to generate the $N_t \ge \frac{8m^2(t+1)^8}{\epsilon^4}\log\frac{2m(t+1)^2}{\delta}$ samples necessary at time $t$. Instead, we can "save" samples that were generated at times $t - 1$ and $t - 2$. For sufficiently large $t$, $N_t \le N_{t-1} + N_{t-2}$ and our initial point $w_0$ would be picked according to either $\tilde{\pi}_{t-1}$ or $\tilde{\pi}_{t-2}$. As we see in the proof of Lemma 22, this distinction is not important.



If $w_\tau$ is the position of our random walk at time $\tau \ge 0$, we pick its position at time $\tau + 1$ as follows. Note that $w_\tau$ has $2(k-1)\ell$ neighbors, two along each axis in the Cartesian product of $\ell$ $(k-1)$-dimensional spaces. Let $w$ be a neighbor of $w_\tau$, selected uniformly at random. If $w \in \Omega$, set

$$w_{\tau+1} = \begin{cases} w & \text{with probability } p = \min\left(1, \frac{\mathcal{R}_t(w)}{\mathcal{R}_t(w_\tau)}\right), \\ w_\tau & \text{with probability } 1 - p. \end{cases}$$

If $w \notin \Omega$, let $w_{\tau+1} = w_\tau$. It is well-known that the stationary distribution of this random walk is $\pi_t$. We must determine how many steps of the walk are necessary before the distribution has gotten sufficiently close to stationary. Let $p_\tau$ be the distribution attained after $\tau$ steps of the random walk. That is, $p_\tau(w)$ is the probability of being at $w$ after $\tau$ steps.
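The walk just described can be sketched in a few lines. This is a simplified illustration, not the paper's exact construction: it assumes a single simplex ($\ell = 1$) parameterized by its first $k - 1$ coordinates, uses grid points with integer indices rather than cube centers, and takes the target function as an argument. All names are hypothetical.

```python
import random

def metropolis_on_grid(target, g, dim, n_steps, seed=0):
    """Metropolis walk on grid points idx / g with idx >= 0 and sum(idx) <= g."""
    rng = random.Random(seed)
    idx = [0] * dim                          # start at a corner of the grid
    visited = []
    for _ in range(n_steps):
        axis = rng.randrange(dim)
        cand = list(idx)
        cand[axis] += rng.choice((-1, 1))    # uniformly chosen neighbor
        inside = min(cand) >= 0 and sum(cand) <= g
        if inside:
            w_old = [i / g for i in idx]
            w_new = [i / g for i in cand]
            p = min(1.0, target(w_new) / target(w_old))
            if rng.random() < p:
                idx = cand                   # accept the move
        # a neighbor outside the grid, or a rejected move, leaves the walk in place
        visited.append([i / g for i in idx])
    return visited
```

Because the proposal is symmetric and moves are accepted with probability $\min(1, \text{ratio})$, the stationary distribution is proportional to the target on the grid. For a target that increases steeply in the first coordinate, the visited points concentrate near the high-return corner.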

Remark 9 A distinction should be made between $t$ and $\tau$. We use $t$ to refer to the time step in our universalization algorithm. We use $\tau$ to refer to "sub" time steps used in the Markov chain to sample from $\pi_t$. When $t$ is clear from context, we may drop it from the subscripts in our notation.

Applegate and Kannan show that if the desired distribution $\pi_t$ is proportional to a log-concave function $F$ (i.e. $\log F$ is concave), the Markov chain is rapidly mixing, i.e. it reaches its steady state in polynomial time. Frieze and Kannan [9] give an improved upper bound on the mixing time using logarithmic Sobolev inequalities.



Theorem 17 (Theorem 1 of [9]) Assume the diameter $d$ of $W$ satisfies $d \ge \delta_t\sqrt{k\ell}$ and that the target distribution $\pi$ is proportional to a log-concave function. There is an absolute constant $\kappa > 0$ such that



, . , N V _^r*?, 1 Mir e k£d 2 
- 1 /_. f( w ) -Pr(w)| <e^log 1 -2 — , (15) 

\wen / 1 

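where $\pi_* = \min_{w \in \Omega}\pi(w)$, $M = \max_{w \in \Omega}\frac{p_0(w)}{\pi(w)}\log\frac{p_0(w)}{\pi(w)}$, $p_0(\cdot)$ is the initial distribution on $\Omega$, $\pi_e = \sum_{w \in \Omega_e}\pi(w)$, and $\Omega_e = \{w \in \Omega \mid \mathrm{Vol}(C(w) \cap W) < \mathrm{Vol}(C(w))\}$ (the "e" in the subscripts of $\pi_e$ and $\Omega_e$ stands for "edge").

7 Ideally, we would like to begin with a point selected according to $\tilde{\pi}_{t-1}$, but, as discussed in Remark 8, this is not always possible.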
In the random walk described above, if $w_\tau$ is on an edge of $\Omega$, so it has many neighbors outside $\Omega$, the walk may get "stuck" at $w_\tau$ for a long time, as seen in the "$\pi_e$" term of Theorem 17. We must ensure that the random walk has low probability of reaching such edge points. We do this by applying a "damping function" to $\mathcal{R}_t$ that becomes exponentially small near the edges of $W$. For $1 \le \iota \le \ell$, $1 \le j \le k$, and $w = (w_1, \ldots, w_\ell) = ((w_{11}, \ldots, w_{1k}), \ldots, (w_{\ell 1}, \ldots, w_{\ell k})) \in W$ let

$$f_{\iota j}(w) = e^{\Gamma\min(-\sigma + w_{\iota j},\,0)}, \quad (16)$$

where $\sigma > 0$ and $\Gamma \ge 2$ are constants that we choose below, and let

$$F_t(w) = \mathcal{R}_t(w)\prod_{\iota=1}^{\ell}\prod_{j=1}^{k} f_{\iota j}(w).$$

Lemma 18 $F_t$ is log-concave if and only if $\mathcal{R}_t$ is log-concave.^8

Proof: This follows from the fact that log-concave functions are closed under multiplication and the fact that $\log f_{\iota j}(w) = \Gamma\min(-\sigma + w_{\iota j}, 0)$, which is concave. ■
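A small sketch of the damping construction may help. It assumes, following the proof of Lemma 18, that each factor has the form $f_{\iota j}(w) = e^{\Gamma\min(w_{\iota j} - \sigma, 0)}$, and it treats $w$ as a flat list of all coordinates $w_{\iota j}$. The function names and the `return_fn` argument are illustrative.

```python
import math

def damping(w, sigma, gamma_cap):
    """Product over all coordinates of exp(Gamma * min(w_j - sigma, 0))."""
    return math.exp(gamma_cap * sum(min(x - sigma, 0.0) for x in w))

def damped_target(return_fn, w, sigma, gamma_cap):
    """F_t(w): the return function multiplied by the damping factors."""
    return return_fn(w) * damping(w, sigma, gamma_cap)
```

The damping equals 1 whenever every coordinate is at least $\sigma$, so $F_t$ and $\mathcal{R}_t$ agree in the interior, and it decays exponentially at rate $\Gamma$ as a coordinate drops below $\sigma$.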

Choose $\sigma = \frac{1}{k}\,\delta_t\!\left(\frac{\gamma_t}{2}\right)$, where $\delta_t(\cdot)$ is defined in Lemma 16 and $\gamma_t$ is defined in (12). Let $\zeta_t^F \propto F_t$ be the probability measure proportional to $F_t$. We need to show that for our purposes, sampling from $\zeta_t^F$ is not much different than sampling from $\zeta_t$. By Lemma 14, we can do this by showing that $\int_W |\zeta_t(w) - \zeta_t^F(w)|\,dw \le \gamma_t$, which we do in Lemma 19.

Remark 10 Before continuing, we show how $W$ can be scaled, which will be useful in future proofs. Take $p = (\frac{1}{k}, \ldots, \frac{1}{k}) \in W_k$; given $\chi \in (-1, 1)$, let

$$w^{(\chi)} = (1 + \chi)(w - p) + p$$

and let

$$W_k^{(\chi)} = \{w^{(\chi)} \mid w \in W_k\}$$

be a scaled version of $W_k$ about $p$, where the scaling factor is $1 + \chi$. To extend this scaling to $W = W_k^\ell$, given $w = (w_1, \ldots, w_\ell) \in W$, let $w^{(\chi)} = (w_1^{(\chi)}, \ldots, w_\ell^{(\chi)})$ and let

$$W^{(\chi)} = \{w^{(\chi)} \mid w \in W\}.$$

A fact we use is that for $1 \le \iota \le \ell$, $1 \le j \le k$, and $w = (w_1, \ldots, w_\ell) \in W$,

$$|w_{\iota j}^{(\chi)} - w_{\iota j}| = \left|(1 + \chi)\left(w_{\iota j} - \frac{1}{k}\right) + \frac{1}{k} - w_{\iota j}\right| \le |\chi|.$$
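The scaling of Remark 10 is a one-line map per coordinate. The sketch below applies it to a single simplex point, assuming $p$ is the center $(\frac{1}{k}, \ldots, \frac{1}{k})$; the function name is hypothetical.

```python
def scale_about_center(w, chi):
    """Map w to (1 + chi) * (w - p) + p with p = (1/k, ..., 1/k)."""
    k = len(w)
    return [(1.0 + chi) * (x - 1.0 / k) + 1.0 / k for x in w]
```

For example, scaling in with $\chi = -k\sigma$ moves every coordinate to at least $\sigma$ while keeping the coordinates summing to 1, which is the property used for the "scaled-in" set in the proof of Lemma 19.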

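Lemma 19 $\int_W |\zeta_t(w) - \zeta_t^F(w)|\,dw \le \gamma_t$.

8 We characterize investment strategies for which $\mathcal{R}_t$ is log-concave in Theorem 26.

Proof: Let $W' = W^{(-k\sigma)}$ be the "scaled-in" version of $W$, as defined in Remark 10. By Lemma 16, since $|w_{\iota j} - w'_{\iota j}| \le k\sigma = \delta_t\!\left(\frac{\gamma_t}{2}\right)$ for all $\iota$ and $j$, $\mathcal{R}_t(w') \ge \frac{1}{1 + \gamma_t/2}\,\mathcal{R}_t(w)$ and

$$\int_{W'} \mathcal{R}_t(w)\,dw \ge \frac{1}{1 + \gamma_t/2}\int_W \mathcal{R}_t(w)\,dw. \quad (17)$$

Let $W_{eq} = \{w \in W \mid F_t(w) = \mathcal{R}_t(w)\}$ be the subset of $W$ where $F_t(\cdot)$ and $\mathcal{R}_t(\cdot)$ are equal; $W' \subseteq W_{eq}$ since, by construction of $w'$, $w'_{\iota j} \ge \sigma$ for all $\iota$ and $j$. Let $W_+ = \{w \in W \mid \zeta_t^F(w) \ge \zeta_t(w)\}$ be the subset of $W$ where $\zeta_t^F(\cdot)$ is at least $\zeta_t(\cdot)$ and let $W_- = W - W_+$. We bound

$$\int_W |\zeta_t^F(w) - \zeta_t(w)|\,dw = \int_{W_+} (\zeta_t^F(w) - \zeta_t(w))\,dw + \int_{W_-} (\zeta_t(w) - \zeta_t^F(w))\,dw$$

by bounding $\int_{W_-}(\zeta_t - \zeta_t^F)$, which also gives a bound for $\int_{W_+}(\zeta_t^F - \zeta_t)$, since

$$\int_{W_+} (\zeta_t^F - \zeta_t) = \left(1 - \int_{W_-}\zeta_t^F\right) - \left(1 - \int_{W_-}\zeta_t\right) = \int_{W_-} (\zeta_t - \zeta_t^F).$$

Since $F_t \le \mathcal{R}_t$, $\int_W F_t \le \int_W \mathcal{R}_t$ and $\zeta_t^F(w) = \frac{F_t(w)}{\int_W F_t} \ge \frac{\mathcal{R}_t(w)}{\int_W \mathcal{R}_t} = \zeta_t(w)$ for $w \in W_{eq}$; thus $W' \subseteq W_{eq} \subseteq W_+$ and $W_- \subseteq W - W'$. We have

$$\int_{W_-} (\zeta_t(w) - \zeta_t^F(w))\,dw \le \int_{W - W'} \zeta_t(w)\,dw = \frac{\int_{W - W'}\mathcal{R}_t(w)\,dw}{\int_W \mathcal{R}_t(w)\,dw} = 1 - \frac{\int_{W'}\mathcal{R}_t(w)\,dw}{\int_W \mathcal{R}_t(w)\,dw} \le 1 - \frac{1}{1 + \gamma_t/2} \le \frac{\gamma_t}{2},$$

where the second-last inequality follows from (17). This completes the proof. ■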

Henceforth, we are concerned with sampling from $W$ with probability proportional to $F_t(\cdot)$. We use the Metropolis algorithm described above, replacing $\mathcal{R}_t(\cdot)$ with $F_t(\cdot)$; we must refine our grid spacing $\delta_t$ so that (14) is satisfied by $F_t$; let $\delta'_t$ be the new grid spacing.

Lemma 20 Suppose that the conditions of Lemma It are satisfied. Given v > 0, let S' t {y) = S' t = 
WdmFM = ^*(r)> w ^ ere r appears in ([I^j. For w,w' G W such that — u;J-| < <5j(f) for all 
1 < j < t and 1 < j < fe, (1 + z^)~ 1 F t (w) < F(w') < (1 + u)F t (w). 

Proof: By Lemma[l6|, lZt(w) and 7£t(w') differ by at most a factor 1 + For each i and j, /y (w) 
and fij(w') differ by at most a factor e r5 * M and hence nLillj=i A?( w ) and IlLi llj=i fij( w ') 
differ by at most a factor e^^tM = es^'mt 4 . Hence, for T > 2 and sufficiently large t, Ft(w) and 
Ft(w') differ by at most a factor 1 + v. ■ 



We are now ready to use Theorem 17 to select $\tau$ so that the resulting distribution $p_\tau$ satisfies (12) (Theorem 24) and (13) (Theorem 25) with $p_\tau$ in place of $\tilde{\pi}_t$ and $F_t$ in place of $\mathcal{R}_t$. We begin with some preliminary lemmas.

Lemma 21 There is a constant > such that log ^- < fc£T cr + fcOog y + t log where £ is 

dehned in Remark || 
Proof: Take (3 such that the number of points in Q is at most (Jr)^ For wi,W2 G £1, the 
ratio of single-day returns on day if using wi and W2 is 

5f(wi) • e 



5 t /(w 2 ) -x t , " 2m(t' + l) 2 
by Remark |6| and Lemma ||. The ratio of the cumulative returns up to day t is 



> 



7^(w 2 ) ~ \2mt 2 J ' 

and thus ^ ^^( w ) — (^jr)*- Factoring in the maximum dampening effect of the fy, 

7T* > e -^( s l)(k-i)l and i g J_ < fcflv + ^log §■ + Hog ■ 



Lemma 22 M < 4 (M^il!) i og M^ili. 

Proof: As stated in Remark [8|, the initial distribution is either po = itt-i or irt-2- It turns out 
that the worst case happens when pq = 7ff_ 2 - For all w £ fi, ^~^7^ < 2 b; 

7r t _ 2 (w) F t _ 2 (w) Ewen^( w ) 



^( w ) E we n- F *-2(w) Ft i 



w 



F f - 2 (w) F(w') _ , F t (w) 

< — — • — — — ( by Lemma y, where w = arg max 



F(w) F t _ 2 (w') w u ' & we cF_ 2 (w) 

ft t _ 2 (w) ^t(w') 



K t (w) Kt-2(vr') 



(since the {fij{-)}i,j remain constant with time) 



(StW-xtKSt-iW-xt-i) ( 2m(t + l) 2 \ 2 
(5 t (w)-x t )(5 t _i(w)-x t _ 1 ) " V e / 



where the final inequality follows from the discussion in the proof of Lemma 21. This proves the 
result since = **- 2(w) . ■ 

7T t (w) 7T t -2(w) 7Tt(w) 



Lemma 23 vr e < (1 + i/) 4 (l + : 2 i )e- r<T , where v appears in the definition of 5' t in Lemma 



It 



appears in (12), and T and a appear in 
Proof: Extend our 5^-hypercube partition of W to the hyperplane containing W and let $ be the 



set of centers of the hypercubes in this extended partition. For K C R , let $ k be the set of grid 
points w € * such that C(w) n K j= 0, so that 17 = *w By Lemma ||, for F C W, 

— - — V F t (w)Vol(C(w) DK) < f F t (w)dw < (1 + v) V F t (w)Vol(C(w) n K). (18) 

Using the notation of Lemma [H| let W = W^~ ka ^ be a "scaled-in" version of W; we showed in 
Lemma || that for w G W, F(w) = TZ t (w) and that 

/ F(w)dw = f TZt(w)dw > — ^ [ K t (w)dw. (19) 
Let W" = W^^)) be a "scaled-out" version of W and extend the domains of F t (-) and H t {-) to 
W" by defining F t (w") = F t (w") and TZ t {w") = K t (w") for w" G W" - W, where w" is the point 



where the line between w" and p = (p, ... , p) G W intersects the boundary of W. By Lemma 20 
and the construction of the extension of TZt, TZt(w") < (1 + u)TZt{w) and 

/ Ht(yr)dw < + / n t (w)dw. (20) 

By construction of W", C(w) C W" for w G f2 e ; from the definition of Ft and the choice of 5' t , 
-Pl(w) < (1 + ^)e _r<T 7^(w) for w G O e . Using these facts, 

E wen . fl(w) < aj*- 1 )' (i + ^EwE^iw) 

„ Ew P * Voi(C(w)nw")^(w) 1V 

< (! + ^" r v vi^ nW^Y (since Vol(C(w)) = 5 fe -^) 

< {1 + u)e -r ^ ;J r w F /\; (by©) 

< (1 + ,) 3 e- r - ^ ^(w)rfw ^ + g (by(g)and@) . 



Remark 11 We simplify notation below by using notation, which ignores logarithmic and 

constant terms. For our purposes, /(•) = 0* (#(•)) if there exists a constant C > such that /(•) = 
0(g(-) log (kimt/s)). The values derived above in this notation are jt = C*("^)> $t = C , *( mt 4 fc ^ ), 
a = 0*(^FPl), = 0*(raz)' lo g^ = 0*(k£T* + t), M = 0*(*££), andn e = 0*(e~ r "). 

Theorem 24 Letting T = 0*(~) = 0*( m -^^) ! the random walk reaches a distribution tt that 
satisfies @ after r = 0*{ k7 ^fjT ) steps. 

Proof: We show how to bound the right-side of (|l5|), where the grid spacing St has been replaced 

flTekl 

— F2 

KS' t 



by 8'f The second term, Mne ^ d ; cari be made exponentially small in T by choosing T = 0*(^). The 



t /2 



_ KTd i i 

value of r stated in the theorem is large enough to make the first term, e «?" log — , exponentially 
small in r. ■ 

Theorem 25 Suppose that the distribution p TQ obtained after To steps satishes 

|vr(w) -p ro (w)| < j t . 

wen 

After Tq > - — j°_ lo j_ log ^- = 0*(To(k£ + t)) steps, the resulting distribution p T ^ satisfies 



P^(W) 

max — p — 1 < 1, 

wen 7r(w) 



which implies ffi 
Proof: Let d(r) = \ Ewen K( w ) ~Pt(w)| and d(r) = max weS i - 1 so that d(r ) < ^7*. 

Aldous and Fill prove Equat ions (5) and (6)] that if r > 4 log — , then d(r) < 1, where 
7T* = mm we Q-K t (w) is as defined in the statement of Theorem [17] and A is the second-largest 
eigenvalue of the steady-state transition matrix P of Wf 



To prove the bound on Tq, we show that A > 



TQ -log ■ 



7i 



1 



log 



TO TO 

appealing to a result from Sinclair 16, Proposition 1 (i)], which states that 



-+log^- 

T - We do this by 



r < 



log £ + log i 
T^A 



Solving for A yields the bound for Tq. The 0*{ ) bound comes from the fact that IV = 0*(1) and 
that log — and log — are low-order terms relative to the rn obtained in Theorem |24|. ■ 



4.3 Application to Investment Strategies 

The efficient sampling techniques of this section are applicable to investment strategies $S$ whose return functions $\mathcal{R}_n(S(\cdot))$ are log-concave. Theorem 26 and Corollary 27 characterize such functions.

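Theorem 26 Given investment strategy $S$, suppose that for all parameters $w_i$ and $w_j$, $\frac{\partial^2 S_t(w)}{\partial w_i\,\partial w_j} = 0$. Then $\mathcal{R}_t(w) = \mathcal{R}_t(S(w))$ is log-concave.

Proof: Let $r_t(w) = S_t(w) \cdot x_t$, so that $\mathcal{R}_n(w) = \prod_{t=0}^{n-1} r_t(w)$. Since log-concave functions are closed under multiplication, we need only show that $r_t(w)$ is log-concave. The gradient vector of $\log r_t(w)$ has $i$-th element $\frac{\partial \log r_t(w)}{\partial w_i} = \frac{1}{r_t(w)}\frac{\partial r_t(w)}{\partial w_i}$ and the matrix of second derivatives has $(i, j)$-th element

$$\frac{\partial^2 \log r_t(w)}{\partial w_i\,\partial w_j} = -\frac{1}{r_t(w)^2}\frac{\partial r_t(w)}{\partial w_i}\frac{\partial r_t(w)}{\partial w_j} + \frac{1}{r_t(w)}\frac{\partial^2 r_t(w)}{\partial w_i\,\partial w_j} = -\frac{1}{r_t(w)^2}\frac{\partial r_t(w)}{\partial w_i}\frac{\partial r_t(w)}{\partial w_j},$$

since $\frac{\partial^2 r_t(w)}{\partial w_i\,\partial w_j} = \sum_{\iota=1}^{m}\frac{\partial^2 S_{t\iota}(w)}{\partial w_i\,\partial w_j}\,x_{t\iota} = 0$ by assumption. The matrix of second derivatives is negative semidefinite, implying that $\log r_t(w)$ is a concave function. ■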
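For CRP the investment description is linear in $w$, so the hypothesis of Theorem 26 holds and the log-return should be concave. The sketch below checks midpoint concavity numerically on random portfolios; the price relatives are made-up test data and the function names are hypothetical.

```python
import math
import random

def random_simplex_point(m, rng):
    """Draw a point with positive coordinates summing to 1."""
    e = [rng.expovariate(1.0) for _ in range(m)]
    s = sum(e)
    return [x / s for x in e]

def log_crp_return(w, price_relatives):
    """log R_n(w) = sum over days of log(w . x_t) for a CRP."""
    return sum(math.log(sum(a * b for a, b in zip(w, x))) for x in price_relatives)

def midpoint_concave(f, u, v, tol=1e-12):
    """Check f((u + v) / 2) >= (f(u) + f(v)) / 2 up to a small tolerance."""
    mid = [(a + b) / 2 for a, b in zip(u, v)]
    return f(mid) >= 0.5 * (f(u) + f(v)) - tol
```

A passing check on sampled pairs is only evidence, not a proof; the proof is the negative semidefinite Hessian computed above.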



Corollary 27 Universalizations of the following investment strategies can be computed using the 
sampling techniques of this section. 

1. The trading strategies MA[$k$] and SR[$k$] with long/short allocation functions $g_e(x)$ and $h_p(x, y)$ respectively; and

2. The portfolio strategies CRP and CRP-S. 

Proof: The result follows from a straightforward differentiation of the investment descriptions of 
these strategies. ■ 

9 Strictly speaking, this result pertains to $\lambda_{\max}$, the second-largest absolute value of the eigenvalues of $P$, but as Sinclair discusses [16, page 355] the smallest eigenvalue is unimportant, as $P$ can be modified so that all eigenvalues are positive without affecting mixing times beyond a constant factor.
5 Further Research



We have introduced in this paper a general framework for universalizing parameterized investment 
strategies. It would be interesting to see whether the proof of Theorem 5 can be optimized so that
existing universal portfolio proofs for CRP H ||,|6) are a special case of Theorem [j| These proofs 
not only prove that $\mathcal{L}_n(U(\mathrm{CRP}))$ converges to $\mathcal{L}_n(\mathrm{CRP}(w^*))$, but also prove a bound on the rate of convergence,

$$\frac{\mathcal{R}_n(\mathrm{CRP}(w^*))}{\mathcal{R}_n(U(\mathrm{CRP}))} \le \binom{n + m - 1}{m - 1}.$$

It would also be interesting to study other trading and portfolio strategies that fit in our univer- 
salization framework and to see how our universalization algorithms perform in empirical tests. 